11.1.2 Defining Environment Modeling: Learning the Transition Function and the Reward Function